Abstract

As part of its effort to optimize reliability, Thales Optronics now includes systems that examine
the state of its equipment. The aim of this paper is to use a hidden Markov model to detect
as soon as possible a change of state of optronic equipment, in order to propose maintenance
before failure. For this, we carefully observe the dynamics of a variable called "cool down time",
denoted Tmf, which reflects the state of the cooling system. Indeed, the Tmf is an indirect
observation of the hidden state of the system. This state is modelled by a Markov chain and the
Tmf is a noisy function of it. Thanks to filtering equations, we obtain results on the probability
that an appliance is in a degraded state at time t, given the history of the Tmf up to this
moment. We have evaluated the numerical behavior of our approach on simulated data. Then
we have applied this methodology to our real data and checked that the results
are consistent with reality. This method can be implemented in a HUMS (Health and
Usage Monitoring System). This simple example of HUMS would allow Thales Optronics
to improve its maintenance system: the company will be able to recall appliances
that are estimated to be in a degraded state, and to avoid inspecting too soon those estimated
to be in a stable state.



1 Introduction 

Thales Optronics aims to optimize the availability-to-cost ratio. The company wants to reduce
the failure rate of its appliances by changing its maintenance concept, moving from
a logic of repair to a logic of anticipation of defects. As part of optimizing reliability,
Thales Optronics now includes systems that examine the state of its equipment. This function is
performed by a HUMS (Health and Usage Monitoring System). The role of a HUMS is:

1. to record environmental conditions and use of equipment, 

2. to evaluate the state of the system, 

3. to anticipate and alert about the excesses of operation, 

4. to optimize maintenance operations. 

Our approach comes within a specific context. In this paper, we focus on point 2. We have at our
disposal a variable that reflects the state of the system and we want to detect a change of mode of
this variable (a change of slope in our case). There exist different methods for this kind
of detection, such as the CUSUM, presented for instance by Basseville et al. in [5]. In this paper,
however, we focus on hidden Markov chains to detect this change of mode. The state of our system
at time t is modeled by a Markov chain $X_t$. In our case we do not observe this chain directly but
indirectly through the Tmf variable, a noisy function of this chain. We will see in this paper how
we can address this issue by using filtering theory.



For this, we will first introduce the industrial problem in section 2 and the mathematical 
model in a general case in section 3. Section 4 presents a simulation study and section 5 the 
implementation of the methodology on our real data. 

2 Industrial problem 

Each appliance has a logbook which provides the following information at each start-up:
number of uses, cumulative operating time of the appliance, initial temperature and the "cool down
time" (Tmf, for "temps de mise en froid" in French). The Tmf is the time the system takes to go from
ambient temperature to a very low one. This temperature decrease is required to operate the appliance
and it takes place at every startup. According to experts, an increase of the Tmf results from a
deterioration in the system. Under this hypothesis, a careful observation of the evolution of the Tmf
would allow us to determine the state of the system and prevent breakdown. So we will look at the
evolution of the Tmf, which seems to be a good indicator of the system state.
We suppose that the system has three possible states: 

• Stable state: Tmf is constant. This reflects a system in good working order. There is no 
anomaly to report. 

• Degraded state: the Tmf increases. This reflects a specific deterioration in the system.

• Failure: the system is stopped. 

Appliances move from the stable state to the degraded state, and then reach failure. It is important
to detect the beginning of a degradation as soon as possible in order to prevent failure. As explained
above, we do not observe the state of our system directly but indirectly through the Tmf. Our
objectives are:

• to estimate at every moment the state of the system by the evaluation of the probability of 
being in degraded state knowing the history of Tmf until this moment, 

• to detect as soon as possible the degradation of the system for a maintenance action before 
failure. 

To solve this problem, we use hidden Markov chains. 

3 Hidden Markov model 

In this section, we provide a general mathematical framework to tackle our problem. We have to
detect a rupture in the behavior of the variable Tmf. There exist different methods for this kind of
detection (see for instance Basseville and Nikiforov [5]). We choose to use hidden Markov models
(HMMs). HMMs are frequently used to detect point mutations in DNA in genomics (see for instance
Fridlyand [3]) or in speech recognition (see Rabiner [3]). In the domain of reliability, HMMs are
also used in a context of high-frequency data (see Wang [B]). In our context, the data set
is not large (28 appliances, at most 400 recordings in a logbook). In the following, we will
see that this tool is nevertheless powerful in our context. We first present the model in a general
case and the estimation of the parameters of interest.

3.1 Modeling 
3.1.1 Main process 

Consider $(X_t)_{t \geq 0}$ a Markov chain in continuous time, defined on a probability space
$(\Omega, \mathcal{F}, P)$ with discrete state space $S = \{e_1, e_2, \dots, e_N\} \subset \mathbb{R}^N$.
So $X_t = (X_t^1, \dots, X_t^N)$ is a vector of $\mathbb{R}^N$.
For convenience, we follow Elliott's assumptions [2] and we set $e_i = (0, 0, \dots, 1, 0, \dots, 0)$ so that
$(e_1, e_2, \dots, e_N)$ is an orthonormal basis of $\mathbb{R}^N$.

Let us denote the probability $p_t^i = P(X_t = e_i)$ for $1 \leq i \leq N$ and $p_t = (p_t^1, \dots, p_t^N)$. The
motion of the chain $X_t$ depends on $A = (a_{ij})$, the Q-matrix of the process (see C. Cocozza-Thivent
[1] for a definition). The vector $p_t$ is linked to the matrix $A$ by the Kolmogorov equation
$\frac{dp_t}{dt} = A p_t$ and $X_t$ has the semimartingale representation:

$$X_t = X_0 + \int_0^t A X_r \, dr + V_t \qquad (1)$$

with $V_t$ a martingale.
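
As a short check of how the Kolmogorov equation relates to representation (1), one can take expectations in (1). The derivation below is only a sketch, under the assumptions that $V_0 = 0$ and that the coordinates of $X_t$ are the indicators $X_t^i = \mathbf{1}_{\{X_t = e_i\}}$, which follows from the choice of the $e_i$ as unit vectors.

```latex
\begin{align*}
\mathbb{E}[X_t] &= \big(P(X_t = e_1), \dots, P(X_t = e_N)\big) = p_t, \\
\mathbb{E}[V_t] &= \mathbb{E}[V_0] = 0, \\
p_t &= p_0 + \int_0^t A\, p_r \, dr
\quad\Longrightarrow\quad \frac{dp_t}{dt} = A\, p_t .
\end{align*}
```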

3.1.2 Observation process 

$X_t$ is not directly observed, but through the process $Y_t$ given by the formula:

$$Y_t = \int_0^t c(X_r) \, dr + W_t \qquad (2)$$



with: 

• $(W_t)_{t \geq 0}$ a standard Brownian motion on $(\Omega, \mathcal{F}, P)$ independent of $(X_t)_{t \geq 0}$,

• $c(X_t) = \langle X_t, c \rangle$ where $\langle \cdot, \cdot \rangle$ is the scalar product in $\mathbb{R}^N$ and $c = (c_1, \dots, c_N) \in \mathbb{R}^N$.

So, on average, the increase of the observed process $Y_t$ depends on the state of $X_t$ through $c(X_t)$. A
Brownian noise is added to the slope $c(X_t)$.
Let us denote:

• $(\mathcal{Y}_t)_{t \geq 0}$ the right-continuous complete filtration generated by $\sigma(Y_s : 0 \leq s \leq t)$,

• $(\mathcal{G}_t)_{t \geq 0}$ the right-continuous complete filtration generated by $\sigma(X_s, Y_s : 0 \leq s \leq t)$.

Recall that our aim is to determine the probability that the system is in a particular state
given the trajectory of $Y$ until $t$. The best $L^2$-approximation of this quantity is given by the
conditional probability $\hat{p}_t^i = P(X_t = e_i \mid \mathcal{Y}_t)$ for $1 \leq i \leq N$. Note that

$$P(X_t = e_i \mid \mathcal{Y}_t) = P(X_t^i = 1 \mid \mathcal{Y}_t) = E[X_t^i \mid \mathcal{Y}_t] = \big(E[X_t \mid \mathcal{Y}_t]\big)_i .$$

So we have to compute the $N$-dimensional conditional expectation $E[X_t \mid \mathcal{Y}_t]$. This is the aim of
the next section.

3.2 Filtering equations and parameters estimation 

First, we give filtering equations which provide conditional expectations of functions of $X_t$, given
the history of $Y_t$. Then we will see how these equations allow us to estimate the parameters $A$ and $c$
and the probability of being in a state given $\mathcal{Y}_t$.

3.2.1 Filtering equations 

Elliott [2] gives unnormalized filtering equations for different functionals of $X_t$. To write these
equations, let us denote $\sigma(F(X_s, s \leq t)) = \bar{E}[\Lambda_t F(X_s, s \leq t) \mid \mathcal{Y}_t]$, with $\bar{P}$ and $\Lambda_t$ associated with
the absolutely continuous change of probability:

$$\frac{dP}{d\bar{P}} \Big|_{\mathcal{G}_t} = \Lambda_t = \exp\left( \int_0^t \langle c, X_r \rangle \, dY_r - \frac{1}{2} \int_0^t \langle c, X_r \rangle^2 \, dr \right).$$

This change of probability is a standard method in filtering because under $\bar{P}$, $Y_t$ is independent of
$X_t$. Under $\bar{P}$, the dynamics of the unnormalized filter satisfy a stochastic differential equation.
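Filtering equations are about:

• the state of the system:

$$\sigma(X_t) = \sigma(X_0) + \int_0^t A \sigma(X_r) \, dr + \int_0^t C \sigma(X_r) \, dY_r,$$

• the number of jumps from $e_i$ to $e_j$ in the time interval $[0, t]$, denoted $\mathcal{J}_t^{ij}$:

$$\sigma(\mathcal{J}_t^{ij} X_t) = \int_0^t \langle \sigma(X_r), e_i \rangle a_{ji} e_j \, dr + \int_0^t A \sigma(\mathcal{J}_r^{ij} X_r) \, dr + \int_0^t C \sigma(\mathcal{J}_r^{ij} X_r) \, dY_r,$$

• the waiting time in state $e_i$ on the interval $[0, t]$, denoted $\mathcal{O}_t^i$:

$$\sigma(\mathcal{O}_t^i X_t) = \int_0^t \langle \sigma(X_r), e_i \rangle e_i \, dr + \int_0^t A \sigma(\mathcal{O}_r^i X_r) \, dr + \int_0^t C \sigma(\mathcal{O}_r^i X_r) \, dY_r,$$

• the drift, defined by $\mathcal{T}_t^i = \int_0^t \langle X_r, e_i \rangle \, dY_r$:

$$\sigma(\mathcal{T}_t^i X_t) = c_i \int_0^t \sigma(\langle X_r, e_i \rangle e_i) \, dr + \int_0^t A \sigma(\mathcal{T}_r^i X_r) \, dr + \int_0^t \big[ \langle \sigma(X_r), e_i \rangle e_i + C \sigma(\mathcal{T}_r^i X_r) \big] \, dY_r.$$

The equations about $\mathcal{J}_t^{ij}$, $\mathcal{O}_t^i$ and $\mathcal{T}_t^i$ are useful for the estimation of $A$ and $c$ when $Y_t$ is observed
over a long time.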

3.2.2 Estimation 

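Maximum likelihood estimation of $A$ and $c$ leads to the estimators:

$$\hat{c}_i(t) = \frac{\sigma(\mathcal{T}_t^i)}{\sigma(\mathcal{O}_t^i)} \quad \text{and} \quad \hat{a}_{ji}(t) = \frac{\sigma(\mathcal{J}_t^{ij})}{\sigma(\mathcal{O}_t^i)}. \qquad (3)$$

These estimators converge as $t$ grows, according to R.J. Elliott [2]. Using the filtering equations and the
estimators of $A$ and $c$, it is possible to compute the estimated probability that the system is in
state $e_i$ thanks to the following formula:

$$E[X_t \mid \mathcal{Y}_t] = \frac{\sigma(X_t)}{\sigma(1)}. \qquad (4)$$

Indeed, $P(X_t = e_i \mid \mathcal{Y}_t) = \big(E[X_t \mid \mathcal{Y}_t]\big)_i = \left(\frac{\sigma(X_t)}{\sigma(1)}\right)_i$.

Note that the filtering equations do not directly give $\sigma(\mathcal{J}_t^{ij})$, $\sigma(\mathcal{T}_t^i)$, $\sigma(\mathcal{O}_t^i)$ and $\sigma(1)$ but $\sigma(\mathcal{J}_t^{ij} X_t)$,
$\sigma(\mathcal{T}_t^i X_t)$, $\sigma(\mathcal{O}_t^i X_t)$ and $\sigma(X_t)$. To pass from one to the other, we just have to take the scalar product
of these vectors with the vector $(1, 1, \dots, 1)^T$ to obtain $\sigma(\mathcal{J}_t^{ij})$, $\sigma(\mathcal{T}_t^i)$, $\sigma(\mathcal{O}_t^i)$ and $\sigma(1)$. Indeed, thanks to the
assumptions on $(e_1, \dots, e_N)$, $\sum_{i=1}^N \langle X_t, e_i \rangle = 1$.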
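
The state filter and the normalization by $\sigma(1)$ can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function name `filter_probabilities` is hypothetical, $C$ is assumed to be the diagonal matrix $\mathrm{diag}(c_1, \dots, c_N)$, and $A$ is assumed to be written so that $dp_t/dt = A p_t$ (a Q-matrix written with rows summing to zero would need to be transposed first). The filter is renormalized at every step, which leaves the ratio $\sigma(X_t)/\sigma(1)$ unchanged because the equation is linear, and a small clipping is added as a numerical safeguard.

```python
import numpy as np

def filter_probabilities(y, dt, A, c, p0):
    """Approximate P(X_t = e_i | Y_t) on a regular time grid.

    Each step is an Euler step of d sigma = A sigma dt + C sigma dY,
    with C = diag(c) (assumption), followed by a normalisation.
    """
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    q = np.asarray(p0, dtype=float).copy()
    probs = np.empty((len(y), len(q)))
    probs[0] = q / q.sum()
    for k, dy in enumerate(np.diff(y)):
        q = q + (A @ q) * dt + c * q * dy   # Euler step of the unnormalised filter
        q = np.clip(q, 1e-300, None)        # safeguard against non-positive values
        q = q / q.sum()                     # division by sigma(1)
        probs[k + 1] = q
    return probs
```

With $c = (0, 1)$, a trajectory of $Y$ that rises with slope 1 should push the second component of the output towards 1, and a decreasing trajectory should push it towards 0.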

Let us now illustrate this approach on simulated data. 



4 Simulation study 

In this section, the framework is the following. We assume that the process has two possible states:
first $X_t = e_1$ and second $X_t = e_2$, with transitions from $e_1$ to $e_2$ and conversely. For instance,
$e_1$ (respectively $e_2$) could correspond to the stable state (respectively the degraded state). So $X_t$
oscillates between these two states.

4.1 Probability estimation of being in a degraded state 

We first suppose that we know the matrix $A$ and the vector $c$. The component $a_{12}$ of $A$ is the parameter
of the exponential distribution of the time in stable state (before degraded state) and $a_{21}$ is the
parameter of the exponential distribution of the time in degraded state. Using these values, we
can simulate $X_t$. Then, using the values of $c$ and $X_t$, we simulate $Y_t$ thanks to equation (2) and an
Euler scheme approximation of the stochastic differential equation. Now that our data are
simulated and our parameters known, we can compute the conditional probability that the system is
in degraded state. For this, we use equation (4) for the computation of the conditional probabilities
$\hat{p}_t^2$ and $\hat{p}_t^1 = 1 - \hat{p}_t^2$. This computation is again made by a recursive algorithm that uses the Euler
scheme to approximate stochastic differential equations.
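
The simulation step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `simulate_hmm`, the coding of the states as 0 (stable) and 1 (degraded), the time step `dt` and the seed are all hypothetical choices, and jumps are approximated on the grid by drawing a jump with probability `rate * dt` at each step.

```python
import numpy as np

def simulate_hmm(a12, a21, c, t_max, dt, seed=0):
    """Simulate a two-state chain and its observation process on a grid.

    a12 (resp. a21) is the exponential rate of leaving the stable
    (resp. degraded) state; c[0] and c[1] are the slopes of Y in each state.
    """
    rng = np.random.default_rng(seed)
    n = int(round(t_max / dt))
    states = np.zeros(n + 1, dtype=int)   # chain starts in the stable state
    y = np.zeros(n + 1)                   # Y_0 = 0
    for k in range(n):
        rate = a12 if states[k] == 0 else a21
        jump = rng.random() < rate * dt   # grid approximation of an exponential sojourn
        states[k + 1] = 1 - states[k] if jump else states[k]
        # Euler step of dY = c(X) dt + dW
        y[k + 1] = y[k] + c[states[k]] * dt + np.sqrt(dt) * rng.standard_normal()
    return states, y
```

The grid approximation of the jump times is only reasonable when `rate * dt` is small; an exact alternative would draw the sojourn times directly from exponential distributions.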

An illustration of the good numerical behavior of the computational process is given in Figure 1.
This figure zooms on a part of the trajectory of $X_t^2$ and $\hat{p}_t^2 = E[X_t^2 \mid \mathcal{Y}_t]$. We clearly observe that
the filter correctly provides an evaluation of $\hat{p}_t^2$ that is close to 1 (respectively 0) when $X_t^2 = 1$
(respectively when $X_t^2 = 0$).




Figure 1: Evolution of $X_t^2$ and estimation of the conditional probability $P(X_t^2 = 1 \mid \mathcal{Y}_t)$



Note that in this simulation study, we assume that the parameters $A$ and $c$ are known. This is not
the case in practice and these parameters must be estimated before estimating the probabilities
$\hat{p}_t^1$ and $\hat{p}_t^2$.

4.2 Parameters estimation 

4.2.1 Estimation of matrix A and vector c 

With simulations of the process $X_t$ over a long time, it is possible to use formula (3) to estimate
the parameters $A$ and $c$.

We first suppose the vector $c$ known and we seek to estimate the matrix $A$ from observations of
$Y_t$. However, one difficulty of this estimation step is the fact that $\sigma(\mathcal{J}_t^{ij})$ and $\sigma(\mathcal{O}_t^i)$ are themselves
governed by $A$. So we developed an iterative algorithm to approximate $A$, starting with an arbitrary $A_0$
and operating in the following way: at step $k$, we use $A_{k-1}$ to compute $A_k$ via formula (3).
The convergence of this estimator has been proved by Zeitouni and Dembo.

Now we assume the matrix $A$ known and we seek to estimate the vector $c$ from observations of $Y_t$. For
this, we also use formula (3). Once again, $\sigma(\mathcal{T}_t^i)$ and $\sigma(\mathcal{O}_t^i)$ are governed by $c$ itself. So, by the
same method as previously, we developed an iterative algorithm to approximate $c$, starting
with an arbitrary vector $c_0$.

4.2.2 Sensitivity of filter $P(X_t = e_2 \mid \mathcal{Y}_t)$ to parameters A and c

Since the values of $A$ and $c$ are unknown in practice, it seems important to study the impact of a
poor estimation of $A$ and $c$ on the calculation of the probability $P(X_t = e_2 \mid \mathcal{Y}_t)$. We simulate $X_t$ and $Y_t$
for given values of $A$ and $c$: $A = \begin{pmatrix} \dots & \dots \\ \dots & \dots \end{pmatrix}$ and $c = (-1, 1)$. We then estimate the probability
of being in a degraded state.

In a first step, we consider deviations from the true matrix $A$:

$$A_1 = \begin{pmatrix} \dots & \dots \\ 0.04 & -0.04 \end{pmatrix} \quad \text{and} \quad A_2 = \begin{pmatrix} -0.2 & 0.2 \\ \dots & \dots \end{pmatrix}.$$

With these two matrices, we again compute the probability of being in
a degraded state. Figure 2 gives these estimations. We clearly observe that the deviations do not
severely impact the probability of interest.




Figure 2: Evolution of probability of being in a degraded state for different matrices A 



In a second step, we estimate this probability with deviations from the true value of $c$: $c_1 =
(-0.5, 0.5)$, $c_2 = (-1, 0.5)$ and $c_3 = (0, 1)$. Figure 3 gives the corresponding estimations. Again we
observe that the deviations do not severely impact the probability of interest.

In our real data study we do not estimate the parameters $A$ and $c$ by the method given in section
4.2.1. Indeed, the process is stopped at its first transition to a degraded state, so we do not
observe $X_t$ over a long time with many transitions as in the simulation study. However, the previous
simulations suggest that a misspecification of these parameters does not strongly
impact the filter value $P(X_t = e_2 \mid \mathcal{Y}_t)$.



5 Application to industrial case 
5.1 Data 

We have logbooks of 28 appliances: five of them failed at the end of the study, due to a mechanical
malfunction in the cooling system. For the other appliances, the failures were not mechanical and are
considered to be unpredictable (not related to a degradation effect and often due to an electronic
failure). From the logbooks, we recover the Tmf value and the initial temperature at each startup of the
system. The time unit of the model is the number of startups. We assume a common model
for all appliances ($A$ and $c$ are the same for all of them) and that the motions of the 28 appliances are
mutually independent.

Figure 3: Evolution of the probability of being in a degraded state for different vectors c



5.2 Preliminary data processing 

The two variables, Tmf and initial temperature, are linked. Indeed, a high (resp. low)
initial temperature increases (resp. decreases) the Tmf. So it was necessary to correct the raw
Tmf by a standard linear regression on the initial temperature of the appliance. We use this
regression to bring the Tmf to a setting where the initial temperature is constant and equal to 10°C.
This corrected Tmf is denoted by $Tmf_r$ in the following. In Figure 4 we provide the corrected Tmf
evolution of one appliance. We can see a very noisy phenomenon. Downward peaks may be the result
of an "on/off/on" sequence that is too abrupt for the appliance: the system is on, turned off and
turned back on immediately, so that the initial temperature remains low. To soften this phenomenon,
we decide to smooth the corrected Tmf ($Tmf_r$). For this, we compute a moving average of $Tmf_r$ as follows:

$$Tmf_l(j) = \frac{1}{20} \sum_{i=j-19}^{j} Tmf_r(i), \qquad j \geq 20,$$

where $Tmf_r(i)$ is the value of the corrected Tmf at the $i$-th startup. Let us denote by $Tmf_l$ the
smoothed corrected Tmf value. In our modeling, we set $Y_t = Tmf_l(t)$. A theoretical interest of
this smoothing step is that the filtering method works well with a signal that is not too noisy. Note that
$Tmf_l$ starts at the 20th startup because 20 values of $Tmf_r$ are needed to compute it. In
practice it is necessary to wait 20 startups before the first computation of the probability of being
in a degraded state. In Figure 4 we plot the evolution of the smoothed corrected Tmf of this appliance.
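
The two preprocessing steps above can be sketched as follows. The exact correction formula is not given in the text, so this sketch assumes the simplest reading: fit a straight line of Tmf against initial temperature, then remove the fitted temperature effect relative to 10°C. The function names `correct_tmf` and `smooth_tmf` are hypothetical.

```python
import numpy as np

def correct_tmf(tmf, temp, ref_temp=10.0):
    """Bring each Tmf value to a common initial temperature (default 10 degrees C)
    using the slope of a least-squares line of Tmf against temperature."""
    tmf = np.asarray(tmf, dtype=float)
    temp = np.asarray(temp, dtype=float)
    slope, _intercept = np.polyfit(temp, tmf, 1)
    return tmf - slope * (temp - ref_temp)

def smooth_tmf(tmf_r, window=20):
    """Moving average over the last `window` startups.

    The first output value corresponds to startup number `window`,
    so the output has len(tmf_r) - window + 1 entries.
    """
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(tmf_r, dtype=float), kernel, mode="valid")
```

For a logbook with 400 recordings, `smooth_tmf` would return 381 smoothed values, the first one being available at the 20th startup.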

We can notice on the bottom graph of Figure 4 that $Tmf_l$ remains constant for a while and
then gradually increases. This change in slope was not obvious in the top graph of Figure 4. This
is another interest of the smoothing step. Now, from these $Tmf_l$ values, we are able to calculate the
probability of being in degraded state. For this, we first need to estimate the parameters $A$ and $c$.



5.3 Estimation of parameters A and c 

The estimation method presented in section 4.2.1, which uses the observation of the process over a long
time, is not applicable here. Indeed, the real system does not oscillate between two states because it
is stopped at its first transition to the degraded state. We therefore propose a practical choice for $c$
and $A$, mixing estimation using the data from the 28 appliances and expert opinions.

Figure 4: Evolution of corrected Tmf over uses at 10°C and associated smoothed data

According to experts, the slope of the smoothed curve is close to 0 when the system is in stable state
and it is strictly positive when it degrades. In addition, according to the graphs of the evolution of
$Tmf_l$, the slope is close to 1 when the system is in a degraded state. So we can naturally set
$c = (0, 1)$.

About the Q-matrix $A = \begin{pmatrix} -a_{12} & a_{12} \\ a_{21} & -a_{21} \end{pmatrix}$, $a_{12}$ is the parameter of the exponential distribution
of the time in stable state (before degraded state). We have estimated this parameter using our
data (28 appliances: 5 failure times and 23 censored observations), by a standard survival method
taking censoring into account. In order to detect a change of state as soon as possible
(a constraint requested by Thales) and according to our sensitivity study (see
Section 4.2.2), we chose to set $a_{12}$ to a value 10 times greater than this first estimate.
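
The text does not say which survival method was used. Under the exponential sojourn-time assumption of the model, one standard choice is the maximum likelihood estimator for right-censored exponential data, namely the number of observed events divided by the total time at risk. The sketch below illustrates that choice only; the function name `exponential_rate_mle` and the example durations are hypothetical and are not the Thales data.

```python
def exponential_rate_mle(durations, observed):
    """MLE of an exponential rate with right censoring.

    durations: time at risk for each appliance (here, in number of startups).
    observed: 1 if the transition was observed, 0 if the record is censored.
    """
    events = sum(observed)
    exposure = sum(durations)
    return events / exposure

# Hypothetical example: four appliances, two observed transitions.
rate = exponential_rate_mle([100, 200, 300, 400], [1, 0, 0, 1])
inflated_rate = 10 * rate  # value 10 times greater, as chosen in the text
```

Censored appliances contribute to the exposure but not to the event count, which is how the 23 censored logbooks would enter such an estimate.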

The coefficient $a_{21}$ should equal zero because a system in degraded state cannot return to a
stable state. But in our equations, the filter $P(X_t = e_2 \mid \mathcal{Y}_t)$ must be versatile, so we have chosen
a small positive value for $a_{21}$. With this choice, the chance that an appliance in degraded state returns
to stable state is very small.

Now, we are able to estimate probability of interest. 
5.4 Results



With this choice for $A$ and $c$ and using the filtering equation (4), we computed the probability of
being in a given state at each startup $t$, given the history $\mathcal{Y}_t$. We first consider an appliance denoted
$E_h$. A posteriori, we can see that $E_h$ was trouble-free during its whole history. Figure 5 gives
the evolution of its $Tmf_l$. At each time $t$, we compute its probability of being in degraded state
through (4), using the values of $Tmf_l$ before $t$. We clearly observe a $Tmf_l$ that is nearly constant
over the uses and a probability of being in a degraded state close to zero.




Figure 5: Smoothed data and evolution of the probability of being in a degraded state for $E_h$ (horizontal axes: number of startups)



Now, we consider an appliance denoted $E_d$. A posteriori, we see that $E_d$ degrades and breaks
down at the end of the study. In Figure 6 we see a $Tmf_l$ that is nearly constant during the first uses;
then $Tmf_l$ increases and then decreases to return to its starting level. Finally, we notice an abrupt
rise of $Tmf_l$. Simultaneously, we note that the computed probability of being in degraded state is
very low when $Tmf_l$ is constant and then sharply increases with $Tmf_l$ to one.

To conclude, these two examples illustrate the good numerical behavior of the proposed approach.
We are now interested in a decision criterion that allows us to detect a degraded state as soon as
possible, so that appliances can be returned for a maintenance action before failure.

6 Decision criterion 



Figure 6: Smoothed data and evolution of the probability of being in a degraded state for $E_d$

The increase of the probability of being in a degraded state is not sufficient to detect a future
failure. We have to propose a decision criterion for maintenance. For this, we have tested different
rules based on the fact that the probability has to cross a threshold during a number of consecutive
uses. We have tried different thresholds combined with different numbers of crossings. Each
time, we have recorded the number of false and good detections. It is important to limit both
false-positive and false-negative detections. According to the comparison of these rules, we have
chosen the following criterion: when the probability of being in degraded state equals 1 over a period
of three uses, the appliance is sent back for maintenance.
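
The family of rules described above (a threshold crossed during a number of consecutive uses) can be sketched as follows. The function name `first_alarm` is hypothetical, and the default threshold of 0.999 is an assumption made for this sketch: the criterion in the text is stated as a probability equal to 1, but a numerically computed probability may never be exactly 1.

```python
def first_alarm(probabilities, threshold=0.999, run_length=3):
    """Return the index of the startup at which the rule fires, or None.

    The rule fires at the first startup that completes `run_length`
    consecutive probabilities greater than or equal to `threshold`.
    """
    run = 0
    for k, p in enumerate(probabilities):
        run = run + 1 if p >= threshold else 0   # reset the run on any drop
        if run >= run_length:
            return k
    return None
```

Varying `threshold` and `run_length` and counting false and good detections over all logbooks is one way to reproduce the kind of comparison of rules mentioned in the text.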

We applied this rule to our 28 appliances and we obtained the results presented in Table 1.
The decision criterion provides 26 good detections out of 28. It does not produce any false detection:
the 23 appliances without observed failure were not detected as degraded. For the five appliances
that failed during the study, the criterion identifies three of them as degraded (before failure). We
suppose that for the two appliances which were not correctly identified, the failure may not be
related to a degradation of the cooling system and therefore cannot be detected by our proposed
approach.



Decision criterion             Observed failure   No observed failure
Future failure detected               3                    0
Future failure not detected           2                   23
Total                                 5                   23

Table 1: Results obtained with the decision criterion
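
The counts of Table 1 can be summarized by a few standard rates. These summary measures (accuracy, sensitivity, specificity) are not used in the text itself and are given here only as an illustrative reading of the table, with "observed failure" taken as the positive class:

```latex
\begin{align*}
\text{accuracy} &= \frac{3 + 23}{28} = \frac{26}{28} \approx 0.93, \\
\text{sensitivity} &= \frac{3}{3 + 2} = 0.6, \\
\text{specificity} &= \frac{23}{23 + 0} = 1 .
\end{align*}
```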
7 Concluding remarks

Using this model, Thales is now working to implement a new maintenance algorithm in the HUMS of
its systems in operation.

There are two technical solutions, depending on the embedded calculator of the system:

• the system stores data, assesses the cooler state and provides information about it,

• the system stores data, but the cooler state is assessed by a maintenance laptop plugged
periodically into its maintenance socket.

This model will allow us to improve the maintenance and usage policies of the monitored systems.
The improvements are:

• moving from preventive or corrective maintenance to predictive maintenance, which
reduces the support cost,

• the ability to create a degraded operational mode,

• an increase in the mission success probability (systems will be chosen according to their real status
for critical missions).

The performance of the new maintenance policy is made possible by the combination of mathematical
tools, high technology and a new maintenance organization.